System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor

ABSTRACT

A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results. In addition, the retirement control block further controls the retiring of a group of instructions determined to be retirable, by simultaneously transferring their results from the temporary buffer to the register array, and retires instructions executed in order by storing their results directly in the register array. The method comprises the steps of monitoring the status of the instructions to determine which group of instructions have been executed, determining whether each executed instruction is retirable, storing results of instructions executed out of program order in a temporary buffer, storing retirable-instruction results in a register array and retiring a group of retirable instructions by simultaneously transferring their results from the temporary buffer to the register array, and retiring instructions executed in order by storing their results directly in the register array.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of application Ser. No. 09/631,640, filed Aug. 2, 2000, presently allowed, which is a continuation of application Ser. No. 09/009,412, filed Jan. 20, 1998, now U.S. Pat. No. 6,131,157, which is a continuation of application Ser. No. 08/481,146, filed Jun. 7, 1995, now U.S. Pat. No. 5,826,055, which is a continuation of application Ser. No. 07/877,451, filed May 1, 1992, now abandoned.

The following patents are related to the subject matter of the present application and are incorporated by reference in their entirety herein: “Superscalar RISC Instruction Scheduling,” U.S. Pat. No. 5,497,499, issued Mar. 5, 1996; and “High Performance, Superscalar-Based Computer System with Out-of-Order Instruction Execution,” U.S. Pat. No. 5,539,911, issued Jul. 23, 1996.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to the field of superscalar processors, and more particularly, to a system and method for retiring multiple instructions executed out-of-order in a superscalar processor.

2. Discussion of Related Art

One method of increasing performance of microprocessor-based systems is overlapping the steps of different instructions using a technique called pipelining. In pipelining operations, various steps of instruction execution (e.g. fetch, decode and execute) are performed by independent units called pipeline stages. The steps are performed in parallel in the various pipeline stages so that the processor can handle more than one instruction at a time.

As a result of pipelining, processor-based systems are typically able to execute more than one instruction per clock cycle. This practice allows the rate of instruction execution to exceed the clock rate. Processors that issue, or initiate execution of, multiple independent instructions per clock cycle are known as superscalar processors. A superscalar processor reduces the average number of cycles per instruction beyond what is possible in ordinary pipelining systems.

In a superscalar system, the hardware can execute a small number of independent instructions in a single clock cycle. Multiple instructions can be executed in a single cycle as long as there are no data dependencies, procedural dependencies, or resource conflicts. When such dependencies or conflicts exist, only the first instruction in a sequence can be executed. As a result, a plurality of functional units in a superscalar architecture cannot be fully utilized.

To better utilize a superscalar architecture, processor designers have enhanced processor look-ahead capabilities; that is, the ability of the processor to examine instructions beyond the current point of execution in an attempt to find independent instructions for immediate execution. For example, if an instruction dependency or resource conflict inhibits instruction execution, a processor with look-ahead capabilities can look beyond the present instruction, locate an independent instruction, and execute it.

As a result, more efficient processors, when executing instructions, put less emphasis on the order in which instructions are fetched and more emphasis on the order in which they are executed. As a further result, instructions are executed out of order.

For a more in-depth discussion of superscalar processors, see Johnson, Superscalar Microprocessor Design, Prentice Hall, Inc. (1991).

Scenarios occur whereby the execution of the instructions is interrupted or altered, and the execution must be restarted in the correct order. Two such scenarios will be described.

In a first scenario, during look-ahead operations, many processor designs employ predictive techniques to predict a branch that the program is going to follow in that particular execution. In these systems, the instructions fetched and executed as a result of look-ahead operations are instructions from the branch of code that was predicted. High instruction throughput is achieved by fetching and issuing instructions under the assumption that branches chosen are predicted correctly and that exceptions do not occur. This technique, known as speculative execution, allows instruction execution to proceed without waiting for the completion of previous instructions. In other words, execution of the branch target instruction stream begins before it is determined whether the conditional branch will be taken.

Since the branch prediction occasionally fails, the processor must provide recovery mechanisms for canceling the effects of instructions that were speculatively executed. The processor must also provide restart mechanisms to reestablish the correct instruction sequence.

In a second scenario, out-of-order completion makes it difficult to deal with exceptions. Exceptions are created by instructions when the instruction cannot be properly executed by hardware alone. These exceptions are commonly handled by interrupts, permitting a software routine to correct the situation. Once the routine is completed, the execution of the interrupted program must be restarted so it can continue as before the exception.

Processors contain information that must be saved for a program to be suspended and then restored for execution to continue. This information is known as the “state” of the processor. The state information typically includes a program counter (PC), an interrupt address register (IAR), and a program status register (PSR); the PSR contains status flags such as interrupt enable, condition codes, and so forth.

As program instructions are executed, the state machine is updated based on the instructions. When execution is halted and must later be restarted (i.e., one of the two above scenarios occurs) the processor looks to the state machine for information on how to restart execution. In superscalar processors, recovery and restart occur frequently and must be accomplished rapidly.

In some conventional systems, when instructions are executed out of order, the state of the machine is updated out of order (i.e., in the same order as the instructions were executed). Consequently, when the processor goes back to restart the execution, the state of the machine has to be “undone” to put it back in a condition such that execution may begin again.

To understand conventional systems, it is helpful to understand some common terminology. An in-order state is made up of the most recent instruction result assignments resulting from a continuous sequence of executed instructions. Assignments made by instructions completed out-of-order, where previous instruction(s) have not been completed, are not included in this state.

If an instruction is completed and all previous instructions have also been completed, the instruction's results can be stored in the in-order state. When an instruction's results are stored in the in-order state, the machine never has to access results from previous instructions and the instruction is considered “retired.”

A look-ahead state is made up of all future assignments, completed and uncompleted, beginning with the first uncompleted instruction. Since there are completed and uncompleted instructions, the look-ahead state contains actual as well as pending register values.

Finally, an architectural state is made up of the most recently completed assignment of the continuous string of completed instructions and all pending assignments to each register. Subsequent instructions executed out of order must access the architectural state to determine what state the register would be in had the instruction been executed in order.

One method used in conventional systems to recover from misdirected branches and exceptions is known as checkpoint repair. In checkpoint repair, the processor provides a set of logical spaces, only one of which is used for current execution. The other logical spaces contain backup copies of the in-order state, each corresponding to a previous point in execution. During execution, a checkpoint is made by copying the current architectural state to a backup space. At this time, the oldest backup state is discarded. The checkpoint is updated as instructions are executed until an in-order state is reached. If an exception occurs, all previous instructions are allowed to execute, thus bringing the checkpoint to the in-order state.

To minimize the amount of required overhead, checkpoints are not made at every instruction. When an exception occurs, restarting is accomplished by loading the contents of the checkpointed state preceding the point of exception, and then executing the instructions in order up to the point of exception. For branch misprediction recovery, checkpoints are made at every branch and contain the precise state at which to restart execution immediately.

The disadvantage of checkpoint repair is that it requires a tremendous amount of storage for the logical spaces. This storage overhead requires additional chip real estate, which is a valuable and limited resource in the microprocessor.

Other conventional systems use history buffers to store old states that have been superseded by new states. In this architecture, a register buffer contains the architectural state. The history buffer is a last-in first-out (LIFO) stack containing items in the in-order state superseded by look-ahead values (i.e., old values that have been replaced by new values), hence the term “history.”

The current value (prior to decode) of the instruction's destination register is pushed onto the stack. The value at the bottom of the stack is discarded if its associated instruction has been completed. When an exception occurs, the processor suspends decoding and waits until all other pending instructions are completed, and updates the register file accordingly. All values are then popped from the history buffer in LIFO order and written back into the register file. The register file is now at the in-order state at the point of exception.
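
By way of illustration only, the history buffer technique described above may be modeled as follows. This is a minimal sketch and not a description of any particular conventional system: the class name HistoryBufferMachine and its methods are hypothetical, and the superseded value is pushed at the time of the register write rather than at decode, as a simplification.

```python
class HistoryBufferMachine:
    """Toy model of history-buffer recovery (all names are illustrative)."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs  # register file, updated out of order
        self.history = []           # LIFO stack of (register, superseded value)

    def write(self, reg, value):
        # Save the value being superseded, then overwrite the register.
        self.history.append((reg, self.regs[reg]))
        self.regs[reg] = value

    def discard_oldest(self):
        # Drop the bottom entry once its instruction has completed.
        if self.history:
            self.history.pop(0)

    def recover(self):
        # Pop every saved value in LIFO order and write it back.
        # Returns the number of pops, a rough stand-in for recovery steps.
        steps = 0
        while self.history:
            reg, old_value = self.history.pop()
            self.regs[reg] = old_value
            steps += 1
        return steps
```

In this model each pop is a separate step, which mirrors why restoring the in-order state takes time proportional to the number of superseded values.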

The disadvantage associated with the history buffer technique is that several clock cycles are required to restore the in-order state.

Still other conventional systems use a reorder buffer managed as a first-in first-out (FIFO) queue to restart after exceptions and mispredictions. The reorder buffer contains the look-ahead state, and a register file contains the in-order state. These two can be combined to determine the architectural state. When an instruction is decoded, it is assigned an entry at the top of the reorder buffer. When the instruction completes, the result value is written to the allocated entry. When the value reaches the bottom of the buffer, it is written into the register file if there are no exceptions. If the instruction is not complete when it reaches the bottom, the reorder buffer does not advance until the instruction completes. When an exception occurs, the reorder buffer is discarded and the in-order state is accessed.

The disadvantage of this technique is that it requires associative lookup to combine the in-order and look-ahead states. Furthermore, associative lookup is not straightforward since it must determine the most recent assignments if there is more than one assignment to a given register. This requires that the reorder buffer be implemented as a true FIFO, rather than a more simple, circularly addressed register array.
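
The associative lookup referred to above may be sketched, for illustration only, as a search of the reorder buffer from newest to oldest entry. The function name read_operand and the list-of-tuples representation are assumptions made for this sketch; actual hardware performs the comparison in parallel rather than in a loop.

```python
def read_operand(reg, reorder_buffer, register_file):
    """Return the architectural value of `reg`.

    reorder_buffer: list of (dest_reg, value) entries, oldest first;
                    value is None while the result is still pending.
    register_file:  list holding the in-order state.
    (Hypothetical data layout, for illustration only.)
    """
    # Search newest-to-oldest so the most recent assignment wins
    # when a register has more than one pending assignment.
    for dest, value in reversed(reorder_buffer):
        if dest == reg:
            return value
    # No pending assignment: the in-order state holds the value.
    return register_file[reg]
```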

What is needed then is a system and method for maintaining a current state of the machine and for efficiently updating system registers based on the results of instructions executed out of order. This system and method should use a minimum of chip real estate and power and should provide quick recovery of the state of the machine up to the point of an exception. Furthermore, the system should not require complex steps of associative lookup to obtain the most recent value of a register.

SUMMARY OF THE INVENTION

The present invention is a system and method for retiring instructions issued out of order in a superscalar microprocessor system. According to the technique of the present invention, results of instructions executed out of order are first stored in a temporary buffer until all previous instructions have been executed. Once all previous instructions have been executed and their results stored in order in a register array, the results of the instruction in question can be written to the register array and the instruction is considered retired.

The register array contains the current state of the machine. To maintain the integrity of register array data, results of instructions are not written to the register array until the results of all previous instructions have been written. In this manner, the state of the machine is updated in order, and situations such as exceptions and branch mispredictions can be handled quickly and efficiently.

The present invention comprises means for assigning and writing instruction results to a temporary storage location, transferring results from temporary storage to the register array so that the register array is updated in an in-order fashion, and accessing results in the register array and temporary storage for subsequent operations.

Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

FIG. 1 is a data path diagram of a superscalar instruction execution unit.

FIG. 2 is a block diagram illustrating the functions of the superscalar instruction execution unit.

FIG. 3 is a diagram further illustrating the instruction FIFO and the instruction window.

FIG. 4 is a diagram illustrating instruction retirement according to the present invention.

FIG. 5A shows the configuration of an instruction window.

FIG. 5B is a diagram illustrating the assignment of instruction results to storage locations in a temporary buffer according to the present invention.

FIG. 6A is a timing diagram illustrating data writing to a register array according to the present invention.

FIG. 6B is a timing diagram illustrating writing results to four register locations per clock cycle according to the present invention.

In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.

DETAILED DESCRIPTION OF THE INVENTION

1. Overview

The present invention provides a system and a method for retiring completed instructions such that to the program it appears that the instructions are executed sequentially in the original program order. The technique of the present invention is to store all out-of-order instruction results (results of instructions not executed in the program order) in a temporary buffer until all previous instructions are complete without any exceptions. The results are then transferred from the temporary buffer to a register array which represents the official state.

When an instruction is retired, all previous instructions have been completed and the retired instruction is officially completed. When instructions are retired according to the technique of the present invention, the state of the machine is updated in order. Therefore, when an exception occurs, out-of-order execution is suspended and all uncompleted instructions prior to the exception are executed and retired. Thus, the state of the machine is up to date as of the time of the exception. When the exception is complete, out-of-order execution resumes from the point of exception. When a branch misprediction is detected, all instructions prior to the branch are executed and retired, the state of the machine is now current, and the machine can restart at that point. All results residing in the temporary buffer from instructions on the improper branch are ignored. As new instructions from the correct branch are executed, their results are written into the temporary buffer, overwriting any results obtained from the speculatively executed instruction stream.

2. Environment

FIG. 1 illustrates a block diagram of a superscalar Instruction Execution Unit (IEU) capable of out-of-order instruction issuing. Referring to FIG. 1, there are two multi-ported register files 102A, 102B which hold general purpose registers. Each register file 102 provides five read ports and two write ports. Each write port allows two writes per cycle. In general, register file 102A holds only integer data while register file 102B can hold both floating point and integer data.

Functional units 104 are provided to perform processing functions. In this example, functional units 104 are three arithmetic logic units (ALUs) 104A, a shifter 104B, a floating-point ALU 104C, and a floating-point multiplier 104D. Floating-point ALU 104C and floating-point multiplier 104D can execute both integer and floating-point operations.

Bypass multiplexers 106 allow the output of any functional unit 104 to be used as an input to any functional unit 104. This technique is used when the results of an instruction executed in one clock cycle are needed for the execution of another instruction in the next clock cycle. Using bypass multiplexers 106, the result needed can be input directly to the appropriate functional unit 104. The instruction requiring those results can be issued on that same clock cycle. Without bypass multiplexers 106, the results of the executed instruction would have to be written to register file 102 on one clock cycle and then be output to the functional unit 104 on the next clock cycle. Thus, without bypass multiplexers 106 one full clock cycle is lost. This technique, also known as forwarding, is well known in the art and is more fully described in Hennessy et al., Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers (1990) on pages 260-262.

Selection multiplexers 108 provide a means for selecting the results from functional units 104 to be written to register files 102.

FIG. 2 illustrates a block diagram of IEU control logic 200 for the IEU shown in FIG. 1. IEU control logic 200 includes an instruction window 202. Instruction window 202 defines the instructions which IEU control logic 200 may issue during one clock cycle. Instruction window 202 represents the bottom two locations in an instruction buffer, which is a FIFO register containing instructions to be executed. This instruction buffer is also referred to as an instruction FIFO. As instructions are completed, they are flushed out at the bottom and new instructions are dropped in at the top. The bottom location of instruction window 202 is referred to as bucket 0 and the top location of instruction window 202 is referred to as bucket 1.

When all four instructions in bucket 0 have been retired, they are flushed out of bucket 0, the instructions in bucket 1 drop into bucket 0 and a new group of four instructions drops into bucket 1. Instruction window 202 may be implemented using a variety of techniques. One such technique is fully described in U.S. Pat. No. 5,497,499, entitled “Superscalar RISC Instruction Scheduling” and issued Mar. 5, 1996, the disclosure of which is incorporated herein by reference.
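
For illustration only, the bucket behavior described above may be modeled in software as follows. This is a sketch under stated assumptions, not the implementation of the referenced patent: the class name InstructionWindow, the method advance, and the representation of instructions as strings are all hypothetical.

```python
from collections import deque

BUCKET_SIZE = 4  # four instructions per bucket, as in the example above


class InstructionWindow:
    """Toy two-bucket window over an instruction FIFO (illustrative names)."""

    def __init__(self, program):
        self.fifo = deque(program)            # instructions not yet in the window
        self.bucket0 = self._fetch_bucket()   # bottom of the window (oldest)
        self.bucket1 = self._fetch_bucket()   # top of the window

    def _fetch_bucket(self):
        count = min(BUCKET_SIZE, len(self.fifo))
        return [self.fifo.popleft() for _ in range(count)]

    def advance(self, retired):
        """Shift the window only if every instruction in bucket 0 is retired.

        retired: set of instructions that have been retired so far.
        Returns True if the window advanced.
        """
        if not all(instr in retired for instr in self.bucket0):
            return False
        self.bucket0 = self.bucket1           # bucket 1 drops into bucket 0
        self.bucket1 = self._fetch_bucket()   # a new group drops into bucket 1
        return True
```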

In the current example, instruction window 202 contains eight instructions. Therefore, IEU control logic 200 tries to issue a maximum number of instructions from among these eight during each clock cycle. Instruction decoding occurs in decoders 203. Instruction decoding is an ongoing process performed in IEU control logic 200. Instructions must be decoded before dependency checking (discussed below), issuing and execution occur.

IEU control logic 200 also contains register renaming circuitry (RRC) 204 which performs two related functions. The first function performed is data dependency checking. Once data dependency checking is complete, RRC 204 assigns tags to each instruction which are used to track the location of instruction operands and results.

Data dependency checking logic, residing in RRC 204, is used for checking instructions for dependencies. In checking for dependencies, the data dependency checking logic looks at the various register file source and destination addresses to determine whether one or more previous instructions must be executed before a subsequent instruction may be executed. FIG. 3 further illustrates instruction window 202 and the instruction FIFO. Referring to FIG. 3, various register file source and destination addresses 302 of the instruction I0 must be checked against the source and destination addresses of all other instructions.

Referring back to FIG. 2, since instruction window 202 in this example can contain eight instructions, the IEU can look at eight instructions for scheduling purposes. All source register addresses must be compared with all previous destination addresses. If one instruction is dependent upon completion of a previous instruction, these two instructions cannot be completed out of order. In other words, if instruction I2 requires the results of instruction I1, a dependency exists and I1 must be executed before I2. Some instructions may be long-word instructions, which require extra care when checking for dependencies. For long-word instructions, the instructions occupy two registers, both of which must be checked when examining this instruction for dependencies.
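
The comparison of source addresses against previous destination addresses may be sketched as follows, for illustration only. The function name find_dependencies and the tuple representation of an instruction are assumptions for this sketch; a long-word instruction is modeled simply by listing two destination registers. The sketch covers only the source-versus-earlier-destination comparison and does not model renaming or tag generation.

```python
def find_dependencies(window):
    """Find, for each instruction, the earlier instructions it must wait for.

    window: list of (dest_regs, src_regs) tuples in program order, where
            each element is a tuple of register addresses. A long-word
            instruction lists two destination registers.
            (Hypothetical representation, for illustration only.)
    Returns a list of sets; entry i holds the indices of earlier
    instructions whose destination matches a source of instruction i.
    """
    deps = []
    for i, (_, srcs) in enumerate(window):
        waits_on = set()
        for j in range(i):
            dests_j = window[j][0]
            # Compare every source address with every earlier destination.
            if any(src in dests_j for src in srcs):
                waits_on.add(j)
        deps.append(waits_on)
    return deps
```

For example, if the third instruction is a long-word instruction writing registers 6 and 7, a later instruction reading register 7 is reported as dependent on it, because both destination registers are checked.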

An additional function performed in RRC 204 is tag assignment. Proper tag assignment is crucial to effective instruction retirement according to the present invention. Each instruction in instruction window 202 is assigned a tag based on its location in instruction window 202, and based on the results of data dependency checking discussed above. The tag assigned to each instruction indicates where in a temporary buffer that instruction's results are to be stored until that instruction is retired and whether all of the previous instructions on which that instruction is dependent have been completed. Tag assignment and the temporary buffer are discussed in more detail below.

A further function performed by IEU control logic 200 is determining which instructions are ready for issuing. An instruction issuer 208 issues instructions to the appropriate functional unit 104 for execution. Circuitry within RRC 204 determines which instructions in instruction window 202 are ready for issuing and sends a bit map to instruction issuer 208 indicating which instructions are ready for issuing. Instruction decode logic 203 indicates the resource requirement for each instruction. Issuer 208 also receives information from functional units 104 concerning resource availability. This information is scanned by issuer 208 and an instruction is selected for issuing.

Instruction issuer 208 sends a control signal 209 to multiplexers 210 telling them which instruction to send to functional units 104. Instruction issuer 208 also sends a control signal 211 to multiplexer 212 configuring it to send the appropriate register address to configure the register that is to receive the results of the instruction. Depending on the availability of functional units 104, issuer 208 may issue multiple instructions each clock cycle.

Referring again to FIGS. 1 and 2, once an instruction is issued to functional units 104 and executed by the same, register files 102A and 102B must be updated to reflect the current state of the machine. When the machine has to “go back” and restart an execution because of an exception or a branch misprediction, the state of the machine must reflect the up-to-date state at the time the exception or branch occurred. Even when instructions are issued and executed out of order, the state of the machine must still reflect, or be recoverable to, the current state at the time of exception or branching.

The Instruction Retirement Unit (IRU) of the present invention retires the instructions as if they were executed in order. In this manner, the state of the machine is updated, in order, to the point of the most recent instruction in a sequence of completed instructions.

The present invention provides a unique system and method for retiring instructions and updating the state of the machine such that when a restart is required due to an exception or a branch misprediction, the current state up to that point is recoverable without needing to wait for the register file to be rebuilt or reconstructed to negate the effects of out-of-order executions.

3. Implementations

FIG. 4 illustrates a high-level diagram of an Instruction Retirement Unit 400 (referred to as “IRU 400”) of the present invention. IRU 400 and its functions are primarily contained within register file 102 and a retirement control block (RCB) 409. As shown in FIG. 4, the functions performed by the environment are also critical to proper instruction retirement.

Referring to FIG. 4, the operation of IRU 400 will now be described. As discussed in subsection 2 of this application, the instructions executed in the superscalar processor environment are executed out of order, and the out-of-order results cannot be written to the registers until all previous instructions' results are written in order. A register array 404 represents the in-order state of the machine. The results of all instructions completed without exceptions, that also have no previous uncompleted instructions, are stored in register array 404. Once the results are stored in register array 404, the instruction responsible for those results is considered “retired.”

If an instruction is completed out of order, and there are previous instructions that have not been completed, the results of that instruction are temporarily stored in a temporary buffer 403. Once all instructions previous to the instruction in question have been executed and their results transferred to register array 404, the instruction in question is retirable, and its results can be transferred from temporary buffer 403 to register array 404. Once this is done, the instruction is considered retired. A retirable instruction, then, is an instruction for which two conditions have been met: (1) it is completed, and (2) there are no unexecuted instructions appearing earlier in the program order.
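
The two conditions for retirability stated above may be expressed, purely as an illustrative sketch, by the following function. The name retirable and the use of a list of completion flags in program order are assumptions of this sketch and do not describe the circuitry of RCB 409.

```python
def retirable(done):
    """Mark which instructions are retirable.

    done: list of booleans in program order; done[i] is True if
          instruction i has completed (illustrative representation).
    Returns a list of booleans; entry i is True only if instruction i
    is completed and no earlier instruction is still unexecuted.
    """
    result = []
    all_earlier_done = True
    for is_done in done:
        # Condition (1): completed. Condition (2): nothing earlier pending.
        result.append(is_done and all_earlier_done)
        all_earlier_done = all_earlier_done and is_done
    return result
```

With completion flags [True, True, False, True], only the first two instructions are retirable; the fourth is completed but must wait for the third.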

If the results of an executed instruction are required by a subsequent instruction, those results will be made available to the appropriate functional unit 104 regardless of whether they are in temporary buffer 403 or register array 404.

Referring to FIGS. 1, 2, and 4, IRU 400 will be more fully described. Register file 102 includes a temporary buffer 403, a register array 404 and selection logic 408. There are two input ports 110 used to transfer results to temporary buffer 403 and register array 404. Control signals (not shown) generated in IEU control logic 200 are used to select the results in selection multiplexer 108 when the results are ready to be stored in register file 102. Selection multiplexer 108 receives data from various functional units and multiplexes this data onto input ports 110.

Two input ports 110 for each register file 102 in the preferred embodiment permit two simultaneous register operations to occur. Thus, input ports 110 provide two full register width data values to be written to temporary buffer 403. This also permits multiple register locations to be written in one clock cycle. The technique of writing to multiple register address locations in one clock cycle is fully described below.

FIGS. 5A and 5B illustrate the allocation of temporary buffer 403. FIG. 5A shows a configuration of instruction window 202, and FIG. 5B shows an example ordering of data results in temporary buffer 403. As noted previously, there can be a maximum of eight pending instructions at any one time. Each instruction may require one or two of the eight register locations 0 through 7 of temporary buffer 403, depending on whether it is a regular-length or a long-word instruction.

The eight pending instructions in instruction window 202 are grouped into four pairs. The first instructions from buckets 0 and 1 (i.e. I0 and I4) are a first pair. The other pairs, I1 and I5, etc., are similarly formed. A result of I0 (I0RD) is stored in register location 0, and a result of I4 (I4RD) is stored in register location 1. If I0 is a long-word entry, I0RD, the low-word result (result of the first half of a long-word instruction), is still stored in location 0, but now the high-word result (I0RD+1, from the second half of the instruction) is stored in location 1. This means that the low-word result of I4 does not have a space in temporary buffer 403, and I4 therefore cannot be issued at this time.
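
The pairing just described may be sketched, for illustration only, as follows. The function name allocate_slots is hypothetical. The sketch models only the case described above, in which the bucket 0 member of a pair is a long-word instruction; how a long-word instruction in bucket 1 is handled is not stated in this passage and is not modeled.

```python
def allocate_slots(long_word):
    """Assign temporary buffer locations to the eight pending instructions.

    long_word: list of eight booleans; long_word[i] is True if pending
               instruction Ii is a long-word instruction. Only the flags
               for I0..I3 (bucket 0) are consulted in this sketch.
    Returns a dict mapping instruction index to its list of locations.
    A bucket 1 instruction whose location is taken is absent from the
    dict, meaning it cannot be issued yet.
    """
    slots = {}
    for pair in range(4):
        first, second = pair, pair + 4        # e.g. I0 and I4
        low, high = 2 * pair, 2 * pair + 1    # e.g. locations 0 and 1
        if long_word[first]:
            # Low word in the first location, high word in the second,
            # leaving no space for the paired bucket 1 instruction.
            slots[first] = [low, high]
        else:
            slots[first] = [low]
            slots[second] = [high]
    return slots
```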

Tags are generated in RRC 204 and assigned to each instruction before the instruction's results are stored in temporary buffer 403. This facilitates easy tracking of results, particularly when instructions are executed out of order. Each tag comprises three bits, for example, to indicate addresses for writing the instruction's results in temporary buffer 403. These three bits are assigned according to the instructions' locations in instruction window 202. The tags are used by the RRC to locate results in temporary buffer 403 if they are operands for other instructions, for example. Table 1 illustrates a representative assignment for these three tag bits.

TABLE 1
Tag Assignment

INSTRUCTION   TAG   LOCATION
0             000   0
1             010   2
2             100   4
3             110   6
4             001   1
5             011   3
6             101   5
7             111   7

Each location in instruction window 202 has a corresponding location in temporary buffer 403. The least significant bit indicates the bucket in instruction window 202 where the instructions originated. This bit is interpreted differently when the bucket containing the instruction changes. For example, when all four instructions of bucket 0 are retired, the instructions in bucket 1 drop into bucket 0. When this occurs the LSB (least significant bit) of the tag that previously indicated bucket 1, now indicates bucket 0. For example, in Table 1, an LSB of 1 indicates the instructions in bucket 1. When these instructions are dropped into bucket 0, the LSB will not change and an LSB of 1 will indicate bucket 0. The tag contains information on how to handle each instruction.
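
The representative assignment of Table 1 can be reproduced, for illustration only, by placing the instruction's position within its bucket in the two upper bits and the bucket number in the least significant bit. The function names tag_for and tag_bits are hypothetical, and the sketch covers only the initial assignment, not the reinterpretation of the LSB after a bucket shift.

```python
def tag_for(position):
    """Return the 3-bit tag (as an integer) for window position 0..7.

    Positions 0..3 are in bucket 0 and positions 4..7 are in bucket 1,
    following the representative assignment of Table 1.
    """
    if not 0 <= position <= 7:
        raise ValueError("window position must be in 0..7")
    bucket = position // 4   # least significant bit of the tag
    slot = position % 4      # upper two bits of the tag
    return (slot << 1) | bucket


def tag_bits(position):
    """Return the tag as a 3-character binary string, e.g. '010'."""
    return format(tag_for(position), "03b")
```

The integer value of the tag is also the temporary buffer location, so instruction 5 receives tag 011 and location 3.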

When the instruction is executed and its results are output from a functional unit, the tag follows. Three bits of each instruction's tag uniquely identify the register location where the results of that instruction are to be stored. A temporary write block (not shown) looks at functional units 104, the instruction results and the tags. Each functional unit 104 has one bit that indicates if a result is going to be output from that functional unit 104 on the next clock cycle. The temporary write block gets the tag for each result that will be available on the next clock cycle. The temporary write block generates an address (based on the tag) where the upcoming results are to be stored in temporary buffer 403. The temporary write block addresses temporary buffer 403 via RRC 204 on the next clock cycle when the results are ready at functional unit 104.

As noted above, a function of the tags is to permit the results of a particular functional unit 104 to be routed directly to the operand input of a functional unit 104. This occurs when a register value represents an operand that is needed immediately by a functional unit 104. The results can also be stored in register array 404 or temporary buffer 403.

In addition, the tags indicate to the IEU when to return those results directly to bypass multiplexers 106 for immediate use by an instruction executing in the very next clock cycle. The instruction results may be sent to either the bypass multiplexers 106, register file 102, or both.

The results of all instructions executed out of order are stored first in a temporary buffer 403. As discussed above, temporary buffer 403 has eight storage locations. This number corresponds to the size of instruction window 202. In the example discussed above, instruction window 202 has eight locations and thus there are up to eight pending instructions. Consequently, up to eight instruction results may need to be stored in temporary buffer 403.

If an instruction is completed in order, that is, all previous instructions are already completed and their results written to register array 404, the results of that instruction can be written directly to register array 404. RCB 409 knows if results can go directly to register array 404. In this situation, RCB 409 sets an external write bit enabling a write operation to register array 404. Note, in the preferred embodiment, the results in this situation are still written to temporary buffer 403. This is done for simplicity.

For each instruction result in temporary buffer 403, when all previous instructions are complete, without any exceptions or branch mispredictions, that result is transferred from temporary buffer 403 to a register array 404 via selection logic 408. If an instruction is completed out of order and previous instructions are not all completed, the results of that instruction remain in temporary buffer 403 until all previous instructions are completed. If one or more instructions have been completed, and they are all awaiting completion of an instruction earlier in the program order, they cannot be retired. However, once this earlier instruction is completed, the entire group is retirable and can be retired.

A done block 420 is an additional state machine of the processor. Done block 420 keeps track of what instructions are completed and marks these instructions ‘done’ using a done flag. The done block informs a retirement control block 409 which instructions are done. The retirement control block 409, containing retirement control circuitry, checks the done flags to see if all previous instructions of each pending instruction are completed. When retirement control block 409 is informed that all instructions previous (in the program order) to the pending instruction are completed, the retirement control block 409 determines that the pending instruction is retirable.
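The retirability check described above can be sketched, purely as an illustration, as a scan of the done flags in program order. The function name `retirable_count`, the `faulted` list standing in for exceptions and branch mispredictions, and the `max_per_cycle` limit of four are assumptions made for this sketch and are not a description of the actual circuitry of retirement control block 409.

```python
def retirable_count(done, faulted, max_per_cycle=4):
    """Count the leading instructions (in program order) that can retire together.

    done[i]    -- done flag of the i-th oldest pending instruction
    faulted[i] -- True if that instruction raised an exception or was mispredicted
    """
    count = 0
    for is_done, has_fault in zip(done, faulted):
        # Stop at the first instruction that is incomplete or faulted:
        # nothing younger than it may retire yet.
        if not is_done or has_fault or count == max_per_cycle:
            break
        count += 1
    return count


no_faults = [False] * 8

# Instructions 1..3 finished out of order but wait on instruction 0.
waiting = retirable_count([False, True, True, True, False, False, False, False], no_faults)

# Once instruction 0 completes, the whole group of four becomes retirable.
released = retirable_count([True, True, True, True, False, False, False, False], no_faults)
```

In this model the younger, already completed instructions contribute nothing until the oldest one is done, at which point the group retires as a unit.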

FIG. 6A is a timing diagram illustrating writing to register array 404, and FIG. 6B is a timing diagram illustrating the transfer of data from temporary buffer 403 to register array 404. Referring to FIGS. 4, 6A, and 6B, the technique of writing to register array 404 will be described.

Temporary buffer 403 has four output ports F, G, H, and I that are used to transfer data to register array 404. Register array 404 has two input ports, A′ and B′, for accepting instruction results from either temporary buffer 403 or functional units 104. Write enable signals 602 and 604 enable writes to temporary buffer 403 and register array 404, respectively, as shown at 603. Although not illustrated, there are actually 2 write enable signals 604 for register array 404. One of these enable signals 604 is for enabling writes to input port A′, and the other is for enabling writes to input port B′. Since there are two input ports A′ and B′, two writes to register array 404 can occur simultaneously.

Data to be written to register array 404 can come from either temporary buffer 403 or functional units 104 (via selection multiplexer 108 and bus 411). Control signal 606 is used to select the data in selection logic 408. When control signal 606 is a logic high, for example, data is selected from temporary buffer 403. Signal 410 is the write address, dictating the location where data is to be written in either temporary buffer 403 or register array 404. Data signal 608 represents the data being transferred from temporary buffer 403 to register array 404. Alternatively, data signal 608 represents data 110 from functional units 104 via selection multiplexer 108.

Register array 404 can write 4 locations in one clock cycle. Address 410 and write enable 604 signals are asserted first, then data 608 and control signal 606 are asserted. Control signal 606 is asserted as shown at 605. During the first half of the cycle, registers corresponding to instructions I0 and I1 will be updated. During the second half of the cycle, registers corresponding to I2 and I3 will be updated. If any of the results are long words, the upper half of the word will be updated during the second cycle. Thus, two results can be simultaneously transferred and two instructions can be simultaneously retired in half a clock cycle. A total of four instructions can therefore be retired per clock cycle.

Referring to FIG. 6B, read addresses 612F, 612G, 612H, and 612I are available for temporary buffer 403 output ports F through I. Data 614F, 614G, 614H, and 614I is available from temporary buffer 403 at the beginning of the clock cycle, as shown at 615. Addresses 410A are generated for input port A′ and 410B are generated for input port B′. Similarly, a write enable signal 604A for input port A′ and a write enable signal 604B for input port B′ are generated for each half of the clock cycle. Address 410 appearing in the first half of the clock cycle, as shown at 611A and 611B, is the location to which data is written during enable signal 604 appearing in the first half, as shown at 605A and 605B. Similarly, data is written during the second half of the clock cycle to the address 410 appearing at that time, as shown at 613A and 613B. Since data is written to A′ and B′ simultaneously, up to four instruction results may be written to register array 404 during one clock cycle. Therefore, up to four instructions may be retired during one clock cycle.
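The way two input ports serve four results per clock cycle can be sketched as a simple schedule, offered as an illustrative model only. The function name `schedule_retirement_writes`, the tuple layout, and the register names in the usage example are hypothetical; the sketch assumes results arrive in program order and that each port accepts one write per half cycle.

```python
def schedule_retirement_writes(results, ports=("A'", "B'")):
    """Assign up to four (register, value) results to a half cycle and an input port.

    Returns a list of (half_cycle, port, register, value) tuples, where
    half_cycle is 1 for the first half of the clock cycle and 2 for the second.
    """
    if len(results) > 2 * len(ports):
        raise ValueError("at most four results can be written per clock cycle")
    schedule = []
    for i, (register, value) in enumerate(results):
        half_cycle = i // len(ports) + 1   # I0, I1 -> first half; I2, I3 -> second half
        port = ports[i % len(ports)]       # alternate between ports A' and B'
        schedule.append((half_cycle, port, register, value))
    return schedule


# Four results for instructions I0..I3, in program order.
plan = schedule_retirement_writes([("r1", 10), ("r2", 20), ("r3", 30), ("r4", 40)])
```

Time-multiplexing the two ports in this manner is what lets the number of results written per cycle exceed the number of input ports.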

Latches in selection logic 408 hold the data constant until the appropriate address 410 is present and write enable signals 604 allow the data to be written.

The process of transferring a result from temporary buffer 403 to register array 404, as described above, is called retiring. When an instruction is retired, it can be considered as officially completed. All instructions previous to that instruction have been completed without branch mispredictions or exceptions and the state of the machine will never have to be redetermined prior to that point. As a result, to the program running in the processor, it appears that the instructions are updated and executed sequentially.

Since instructions are being issued and executed out of order, subsequent instructions may require operands corresponding to results (values) in temporary buffer 403 or register array 404. Therefore, access to register values in temporary buffer 403, as well as values stored in register array 404, is provided by the present invention.

Read access to temporary buffer 403 and register array 404 is controlled by RRC 204. Such read access is required by instructions executing that need results of previously executed instructions. Recall from the discussion in subsection 2 above that RRC 204 performs data dependency checking. RRC 204 knows which instructions are dependent on which instructions and which instructions have been completed. RRC 204 determines if the results required by a particular instruction must be generated by a previous instruction, i.e., whether a dependency exists. If a dependency exists, the previous instruction must be executed first. An additional step is required, however, when a dependency exists. This step is determining where to look for the results of the instruction. Since RRC 204 knows what instructions have been completed, it also knows whether to look for the results of those instructions in temporary buffer 403 or register array 404.

RRC 204 sends a port read address 410 to register array 404 and temporary buffer 403 to read the data from the correct location onto output lines 412. One bit of read address 410 indicates whether the location is in temporary buffer 403 or register array 404. Again, see U.S. Pat. No. 5,497,499, entitled “Superscalar RISC Instruction Scheduling” and issued Mar. 5, 1996, for additional disclosure pertaining to the RRC.

In the preferred embodiment of the present invention, each output port A through E of temporary buffer 403 and register array 404 has its own dedicated address line. That is, each memory location can be output to any port.

4. Additional Features of the Invention

IRU 200 also informs other units when instructions are retired. IRU 200 informs an Instruction Fetch Unit (IFU) when it (the IRU) has changed the state of the processor. In this manner, the IFU can maintain coherency with IEU 100. The state information sent to the IFU is the information required to update the current Program Counter and to request more instructions from the IFU. In the example above, when four instructions are retired, the IFU can increment the PC by four and fetch another bucket of four instructions.

An example of the IFU is disclosed in a commonly owned, copending application Ser. No. 07/817,810 titled “High Performance RISC Microprocessor Architecture.”

In addition, according to a preferred embodiment of the present invention, status bits and condition codes are retired in order as well. Each of the eight instructions in instruction window 202 has its own copy of the status bits and condition codes. If an instruction does not affect any of the status bits, then it propagates the status bits from the previous instruction.

When an instruction is retired, all its status bits have to be officially updated. If more than one instruction is retired in one cycle, the status bits of the most recent (in order) instruction are used for the update.
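The in-order handling of status bits can be sketched as follows, by way of example only. The function names, the use of `None` for an instruction that does not affect the status bits, and the treatment of an update as replacing the whole status value (rather than merging individual bits) are simplifying assumptions of this sketch.

```python
def propagate_status(previous_status, updates):
    """Give each instruction (in program order) its own copy of the status bits.

    updates[i] is the status produced by instruction i, or None if the
    instruction does not affect the status bits and simply propagates them.
    """
    copies = []
    current = previous_status
    for update in updates:
        if update is not None:
            current = update
        copies.append(current)
    return copies


def retire_status(copies, retired_count, architectural_status):
    """Status after retiring the first `retired_count` instructions in one cycle."""
    if retired_count == 0:
        return architectural_status
    # The most recent (in program order) retired instruction supplies the update.
    return copies[retired_count - 1]


copies = propagate_status("Z=0", [None, "Z=1", None, "Z=0"])
after_three = retire_status(copies, 3, "Z=0")
```

Here the third instruction did not change the status bits itself, so retiring three instructions commits the value propagated from the second.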

5. Conclusion

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

1-24. (canceled)
25. A superscalar processor adapted to execute at least one instruction out of program order, said superscalar processor comprising: an instruction window that has a first storage location and a second storage location for storing instructions, an instruction stored in the second storage location being stored in the first storage location when an instruction stored in the first storage location is retired; a plurality of functional units that execute an instruction out of program order; a buffer that has storage locations at which an execution result of each instruction is stored; register renaming circuitry that associates uniquely an address indicating a fixed storage location in the buffer with each instruction included at each storage location in the instruction window, regardless of a change in the storage location of the instruction in the instruction window; a register array that includes a plurality of array locations referenced so that an execution result of a retired instruction can be provided to the referenced array location; a retirement control block that determines whether an executed instruction can be retired or not; and an instruction retirement unit that retires an instruction that can be retired while associating an execution result of each instruction that can be retired with an array location within the register array.
26. The superscalar processor according to claim 25, wherein the instruction window includes a first instruction and a second instruction, the first instruction appears earlier than the second instruction in program order, the second instruction is executed prior to execution of the first instruction, and an execution result of the second instruction is stored in the buffer before an execution result of the first instruction is stored in the buffer.
27. The superscalar processor according to claim 26, wherein the register renaming circuitry associates a first unique address with the first instruction, and associates a second unique address with the second instruction, the buffer has a first storage location and a second storage location, the first unique address indicates the first storage location of the buffer at which a first instruction execution result is stored, and the second unique address indicates the second storage location of the buffer at which a second instruction execution result is stored.
28. The superscalar processor according to claim 27, wherein the first instruction and the second instruction are retired approximately simultaneously.
29. The superscalar processor according to claim 27, wherein the execution result of the first instruction and the execution result of the second instruction are associated with respective register locations of the register array approximately simultaneously.
30. The superscalar processor according to claim 26, wherein the second instruction is not retired before the execution result of the first instruction is stored in the buffer.
31. The superscalar processor according to claim 25, wherein instructions are retired in a group that includes at least two instructions.
32. The superscalar processor according to claim 26, wherein a first plurality of instructions includes the first instruction, a second plurality of instructions includes the second instruction, and the first plurality of instructions appear earlier than the second plurality of instructions in program order.
33. The superscalar processor according to claim 31, wherein the instruction retirement unit is adapted to retire at least two instructions in one clock cycle.
34. The superscalar processor according to claim 25, wherein the associating an execution result is an updating of a value stored at a location of an array, and the array is referenced to provide an in-order state of the processor.
35. An instruction retirement method of a superscalar processor that executes an instruction out of program order, said instruction retirement method comprising: receiving a first instruction and a second instruction, the first instruction appearing earlier than the second instruction in program order; providing an instruction window that has a first storage location and a second storage location for storing instructions, an instruction stored in the second storage location being stored in the first storage location when an instruction stored in the first storage location is retired; assigning a fixed address indicating a first location of a buffer at which a first instruction execution result is stored at the time of execution of the first instruction, and assigning a fixed address indicating a second location of a buffer at which a second instruction execution result is stored at the time of execution of the second instruction, regardless of a change in the storage location of the instruction in the instruction window; storing the execution result of the second instruction at a second location in the buffer using the address assigned to the second instruction; storing the execution result of the first instruction at a first location in the buffer using the address assigned to the first instruction, the second instruction being executed out of program order with respect to the first instruction; determining whether the first instruction can be retired; determining whether the second instruction can be retired; and retiring the first instruction and the second instruction approximately simultaneously by associating in an array the execution result of the first instruction with the second instruction, the execution result of the first instruction and the execution result of the second instruction being identified from the buffer at the first location and the second location, respectively.
36. The instruction retirement method according to claim 35, further comprising: receiving a first plurality of instructions and a second plurality of instructions, the first plurality of instructions appearing earlier than the second plurality of instructions in program order, the first plurality of instructions including the first instruction and the second plurality of instructions including the second instruction.
37. The instruction retirement method according to claim 36, wherein the first instruction and the second instruction retire approximately simultaneously.
38. The instruction retirement method according to claim 35, wherein, in one clock cycle, the execution result of the first instruction is associated with the first location in the array, and the execution result of the second instruction is associated with the second location in the array, and the first instruction and the second instruction are retired approximately simultaneously.
39. The instruction retirement method according to claim 38, wherein the determining whether the second instruction can be retired depends on the storing of the execution result of the first instruction in the buffer.
40. The instruction retirement method according to claim 38, further comprising: determining the execution result of the first instruction by referencing the first location in the array.
41. The instruction retirement method according to claim 35, wherein the first instruction and the second instruction are selected from a plurality of instructions.
42. The instruction retirement method according to claim 41, wherein the locations of the buffer at which the execution results of the first instruction and the second instruction are stored include physical destination.
43. The instruction retirement method according to claim 35, wherein the association of an execution result is an updating of a value stored at a location of an array, and the array is referenced to provide an in-order state of the processor.
44. A computer system having a processor and a memory adapted to store instructions having a program order, said processor comprising: an instruction window that has a first storage location and a second storage location for storing instructions, an instruction stored in the second storage location being stored in the first storage location when an instruction stored in the first storage location is retired; register renaming circuitry that associates uniquely an address indicating a fixed storage location in the buffer with each instruction included at each storage location in the instruction window, regardless of a change in the storage location of the instruction in the instruction window, said register renaming circuitry associating an address with at least one instruction in one clock cycle; a buffer coupled to the register renaming circuitry, said buffer storing an execution result of an instruction at a location described by an address associated with each instruction; a plurality of functional units coupled to the buffer, said plurality of functional units executing an instruction out of program order; an array having a plurality of locations each adapted to identify an execution result of a retiring instruction; a control block that determines whether an executed instruction can be retired; and an instruction retiring section coupled to the control block circuitry and the array, said instruction retiring section retiring an instruction that can be retired by associating an execution result of the instruction that can be retired with a location in the array, and allowing the execution results of the instructions that can be retired to be stored at respective particular locations of the buffer.
45. The computer system according to claim 44, wherein the instruction window includes a first instruction and a second instruction, the first instruction can be executed earlier than the second instruction in program order, the second instruction is executed prior to execution of the first instruction, and an execution result of the second instruction is stored in the buffer before an execution result of the first instruction is stored in the buffer.
46. The computer system according to claim 45, wherein the register renaming circuitry associates a first address with the first instruction, and associates a second address with the second instruction, the buffer has a first location and a second location, the first address identifies the first location of the buffer at which the execution result of the first instruction is stored, and the second address identifies the second location of the buffer at which the execution result of the second instruction is stored.
47. The computer system according to claim 46, wherein the first instruction and the second instruction are retired approximately simultaneously.
48. The computer system according to claim 46, wherein the execution result of the first instruction and the execution result of the second instruction are associated with respective locations of the array in one clock cycle.
49. The computer system according to claim 48, wherein the second instruction is not retired before the execution result of the first instruction is stored in the first location of the buffer.
50. The computer system according to claim 44, wherein the number of instructions that can concurrently be retired is any of two, three or four.
51. The computer system according to claim 50, wherein the execution result of the instruction is stored at the first location of the buffer in response to an address associated with the instruction.
52. The computer system according to claim 46, wherein a first plurality of instructions includes the first instruction, a second plurality of instructions includes the second instruction, and the first plurality of instructions appear earlier than the second plurality of instructions in program order.
53. The computer system according to claim 44, wherein the association of an execution result is an updating of a value stored at a location of an array, and the array is referenced to provide an in-order state of the processor.
54. An instruction retirement method of a superscalar processor that executes an instruction out of program order, said instruction retirement method comprising: receiving a first instruction and a second instruction simultaneously, the first instruction appearing earlier than the second instruction in program order; providing an instruction window that has a first storage location and a second storage location for storing instructions, an instruction stored in the second storage location being stored in the first storage location when an instruction stored in the first storage location is retired; in one clock cycle, assigning a fixed address indicating a first location of a buffer at which a first instruction execution result is stored at a time of execution of the first instruction, and assigning a fixed address indicating a second location of a buffer at which a second instruction execution result is stored at a time of execution of the second instruction, regardless of a change in the storage location of the instruction in the instruction window; storing an execution result of the second instruction at a second location in the buffer using the address assigned to the second instruction; storing an execution result of the first instruction at a first location in the buffer using the address assigned to the first instruction, the second instruction being executed out of program order with respect to the first instruction; determining whether the first instruction can be retired; determining whether the second instruction can be retired; and retiring the first instruction and the second instruction approximately simultaneously by writing the execution result of the first instruction stored at the first location of the buffer into a first location of a register array and writing the execution result of the second instruction stored at the second location of the buffer into a second location of the register array approximately simultaneously.
55. The instruction retirement method according to claim 54, wherein the storing of the execution result of the second instruction occurs before the storing of the execution result of the first instruction.
56. The instruction retirement method according to claim 54, further comprising: receiving a third instruction; determining a third location of the buffer at which a third instruction execution result is stored at the time of execution; storing an execution result of the third instruction at a third location in the buffer; determining whether the third instruction can be retired, wherein the first instruction, the second instruction, and the third instruction are retired approximately simultaneously by writing the execution result of the first instruction stored at the first location of the buffer into the first location of the register array, writing the execution result of the second instruction stored at the second location of the buffer into the second location of the register array, and writing the execution result of the third instruction stored at the third location of the buffer into a third location of the register array approximately simultaneously.
57. The instruction retirement method according to claim 54, wherein the receiving of the first instruction and the second instruction includes receiving a first plurality of instructions including the first instruction and receiving a second plurality of instructions including the second instruction, and the first plurality of instructions appear earlier than the second plurality of instructions in program order.
58. The instruction retirement method according to claim 54, wherein the writing of the execution result of the first instruction stored at the first location of the buffer into the first location of the register array, and the writing of the execution result of the second instruction stored at the second location of the buffer into the second location of the register array approximately simultaneously includes writing the execution result of the first instruction stored at the first location of the buffer into the first location of the register array and writing the execution result of the second instruction stored at the second location of the buffer into the second location of the register array in one clock cycle.
 59. The instruction retirement method according to claim 54,wherein the execution results stored in the register array indicates anin-order state of a superscalar processor.
60. A superscalar processor for executing at least one instruction out of order, said superscalar processor comprising: an instruction window that has a first storage location and a second storage location for storing instructions, an instruction stored in the second storage location being stored in the first storage location when an instruction stored in the first storage location is retired; a buffer for storing an execution result of the first instruction among an instruction group and an execution result of the second instruction among an instruction group; a superscalar register renaming circuitry that associates, among a first address and a second address each of which indicates a fixed storage location in the buffer, the first address with a first instruction among an instruction group and the second address with a second instruction among an instruction group, regardless of a change in the storage location of the instruction in the instruction window, the first address associated with the first instruction indicating a first location of the buffer at which the execution result of the first instruction is stored, the second address associated with the second instruction indicating a second location of the buffer at which the execution result of the second instruction is stored; a plurality of functional units that execute the first instruction and the second instruction out of order among the instruction group; a register array that has a plurality of register array locations for storing an execution result of a retired instruction; a retirement control block that determines whether the first instruction can be retired and determines whether the second instruction can be retired; and an instruction retirement unit that retires the first instruction and the second instruction by storing the execution result of the first instruction stored in the first location of the buffer at a first register array location and storing the execution result of the second instruction stored in the second location of the buffer at a second register array location approximately simultaneously.
61. The superscalar processor according to claim 60, wherein the first address is associated with the first instruction among the instruction group in one clock cycle, and the second address is associated with the second instruction among the instruction group in one clock cycle.
62. The superscalar processor according to claim 60, wherein the execution result of the first instruction stored in the first location of the buffer is stored at the first register array location in one clock cycle, and the execution result of the second instruction stored in the second location of the buffer is stored at the second register array location in one clock cycle.
63. The superscalar processor according to claim 60, wherein the execution result stored in a plurality of the register array locations of the register array indicates an in-order state of a superscalar processor.
64. The superscalar processor according to claim 60, wherein the second instruction is not retired before the execution result of the first instruction is stored in the buffer.
65. An instruction retirement unit, comprising: a buffer that stores execution results of instructions executed out of program order, said buffer having a predetermined number of output ports for outputting the execution results of the instructions; a register array that stores the execution results of the instructions in program order, said register array having a number of input ports smaller than the predetermined output ports of the buffer; and a retirement execution section that is able to write the execution results of the instructions larger in number than the number of the input ports of the register out of the buffer into the register array in one clock cycle.
66. The instruction retirement unit according to claim 65, wherein the retirement execution section is able to write the execution results of the instructions stored in the buffer into the register array every half clock cycle.
67. The instruction retirement unit according to claim 66, wherein the number of the output ports of the buffer is four, the number of the input ports of the register array is two and the retirement execution section is able to retire a maximum of four instructions in one clock cycle.
68. An instruction retirement unit, comprising: a buffer that stores execution results of instructions including instructions executed out of program order, said buffer having a predetermined number of output ports for outputting the execution results of the instructions; a register array that stores the execution results of the instructions in program order, said register array having a number of input ports smaller than the predetermined number of output ports of the buffer; and a retirement execution section able to write a number of the execution results out of the buffer into the register array in one clock cycle, the number of the execution results larger than the number of the input ports of the register array.
69. The instruction retirement unit according to claim 68, wherein the retirement execution section is able to write the execution results of the instructions stored in the buffer into the register array every half clock cycle.
 70. The instruction retirement unit according to claim 69, wherein the number of the output ports of the buffer is four, the number of the input ports of the register array is two, and the retirement execution section is able to retire a maximum of four instructions in one clock cycle.
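
The port arithmetic recited in claims 65 through 70 can be illustrated with a minimal sketch. It is not the claimed circuit: it only assumes that a register array with two input ports is written once per half clock cycle, so that up to four results leave the buffer per clock cycle. The names `INPUT_PORTS`, `HALF_CYCLES`, and `retire_one_cycle` are hypothetical and do not appear in the claims.

```python
from collections import deque

INPUT_PORTS = 2   # register array input ports (claims 67 and 70)
HALF_CYCLES = 2   # one write phase per half clock cycle (claims 66 and 69)


def retire_one_cycle(buffer_results, register_array):
    """Move buffered results into the register array for one clock cycle.

    buffer_results: deque of (register, value) pairs in program order.
    register_array: dict mapping register number to value.
    Returns the number of results retired in this cycle.
    """
    retired = 0
    for _phase in range(HALF_CYCLES):
        # Each half cycle can use every input port once.
        for _port in range(INPUT_PORTS):
            if not buffer_results:
                return retired
            reg, value = buffer_results.popleft()
            register_array[reg] = value
            retired += 1
    return retired
```

With five buffered results, one call retires four of them (two ports times two half cycles) and leaves the fifth for the next cycle.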
 71. A superscalar microprocessor, comprising: one functional unit or a plurality of functional units that execute instructions; a buffer that stores execution results of instructions including instructions executed out of program order; an instruction issuing section that judges whether the functional unit is available and issues an instruction to the available functional unit, triggered by receipt of a tag indicating that the instruction has become executable and indicating a storage location of an operand in the buffer; a dependency judgment section that judges whether the execution result of the previous instruction earlier in program order is necessary for the execution of a current target instruction; a completion judgment section that judges whether the previous instruction earlier in program order has completed if the execution result of the previous instruction earlier in program order is judged as necessary; and a tag issuing section that outputs the tag to the instruction issuing section if it is judged that the previous instruction earlier in program order has completed.
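
The sequence of judgments in claim 71 (dependency, completion, tag issue, unit availability) can be sketched as follows. This is an illustration under stated assumptions, not the claimed hardware: the tag is modeled as the buffer slot of the producing instruction, and an instruction with no dependency is assumed to issue as soon as a unit is free, a case the claim does not address. `Instr` and `try_issue` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str
    depends_on: Optional[int]  # buffer slot of the earlier producer, or None


def try_issue(instr: Instr, done_flags: List[bool],
              free_units: List[str]) -> Optional[Tuple[str, Optional[int]]]:
    """Return (unit, tag) if the instruction issues, else None."""
    tag = None
    if instr.depends_on is not None:          # dependency judgment
        if not done_flags[instr.depends_on]:  # completion judgment
            return None                       # producer not done: no tag yet
        tag = instr.depends_on                # tag issuing: operand location
    if not free_units:                        # functional unit availability
        return None
    unit = free_units.pop()
    return (unit, tag)
```

An instruction whose producer has not completed receives no tag and leaves the functional unit untouched; once the producer's done flag is set, the same call returns the unit together with the operand's buffer location.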
 72. A superscalar microprocessor, comprising: an instruction window that holds, on a first in first out basis, a plurality of instruction groups having a predetermined number of instructions, said instruction window having an entry holding a target instruction group for execution; a retirement judgment section that judges whether, among the plurality of instruction groups, all of the predetermined number of instructions included in the target instruction group held by the entry has been retired; and an instruction window control section that transfers an instruction group subsequent to the current target instruction group to the entry, if it is judged that all of the predetermined number of instructions included in the target instruction group held by the entry has been retired.
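
The window advance condition of claim 72 can be sketched as a first in first out queue of instruction groups. The sketch assumes the head of the queue models the entry holding the target instruction group, and that retirement is tracked as a set of instruction identifiers; `advance_window` and both parameters are hypothetical names.

```python
from collections import deque


def advance_window(window, retired):
    """Advance the FIFO window if the whole target group has retired.

    window:  deque of instruction groups; window[0] is the target entry.
    retired: set of identifiers of instructions already retired.
    Returns True if the next group was moved into the target entry.
    """
    if window and all(instr in retired for instr in window[0]):
        window.popleft()  # the subsequent group becomes the target entry
        return True
    return False
```

The window does not move while any instruction of the target group is still unretired, which matches the "all of the predetermined number of instructions" condition in the claim.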
 73. A superscalar microprocessor adapted to execute at least one instruction out of program order, said superscalar microprocessor comprising: a buffer that stores execution results of instructions including instructions executed out of program order; a completion flag allocation section that allocates a completion flag indicating that execution has been completed to at least one instruction whose execution has completed; a register array that stores the execution results of the instructions in program order; a register renaming section that outputs, to the buffer, a tag including information that uniquely indicates a location in the buffer at which the execution result of the instruction is stored if it is judged that an instruction on which an instruction depends has not retired, and outputs a register address holding a result of the dependent instruction to the register array if it is judged that an instruction on which an instruction depends has retired; and an instruction retirement control section that judges whether the execution of a pending instruction has completed or not based on the completion flag, and retires instructions that can be retired simultaneously if they are judged as retirable.
 74. A computer system comprising: a superscalar microprocessor according to claim 73; and a memory adapted to store instructions having program order.
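
The operand source selection performed by the register renaming section of claim 73 can be sketched as a lookup. The sketch assumes a table of not-yet-retired producers keyed by architectural register; that table, and the names `rename_source` and `pending_writers`, are illustrative assumptions rather than elements of the claim.

```python
def rename_source(reg, pending_writers):
    """Choose where a source operand should be read from.

    reg:             architectural register number of the operand.
    pending_writers: dict mapping a register to the buffer slot (tag) of
                     an earlier instruction that writes it and has not
                     yet retired.
    Returns ("buffer", tag) if the producer has not retired,
    otherwise ("register_array", reg).
    """
    if reg in pending_writers:
        return ("buffer", pending_writers[reg])
    return ("register_array", reg)
```

When the producer retires, removing its entry from the table redirects later readers from the buffer to the register array, which corresponds to the two outputs recited for the renaming section.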
 75. A superscalar microprocessor that is able to execute at least one instruction out of program order, said superscalar microprocessor comprising: a functional section that executes inputted instructions; a buffer that stores execution results of instructions including instructions executed out of program order; a register renaming section that uniquely allocates, to an executed instruction, a tag including information that indicates a location in the buffer at which an execution result of the executed instruction is stored; a tag determination section that determines a necessary tag for execution of an instruction that will be executed after the executed instruction; and an execution result sending section that sends the execution result of the executed instruction to the functional section if the tag allocated to the executed instruction agrees with the necessary tag.
 76. The superscalar microprocessor according to claim 75, further comprising a register array that stores the execution results of the instructions in program order, wherein the execution result sending section sends the execution result of the executed instruction to the buffer if the tag allocated to the executed instruction disagrees with the necessary tag.
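
The tag comparison in claims 75 and 76 can be sketched as a routing decision. The sketch follows the claim wording literally: a matching tag sends the result to the functional section, and a mismatch sends it to the buffer. Whether a forwarded result is also written to the buffer is not stated in these claims and is not modeled here. `send_result` and its parameters are hypothetical names.

```python
def send_result(tag, value, necessary_tag, buffer):
    """Route an execution result according to the tag comparison.

    tag:           tag allocated to the executed instruction (buffer slot).
    value:         its execution result.
    necessary_tag: tag needed by the instruction that executes next.
    buffer:        dict mapping buffer slot to stored result.
    Returns the destination name and the value sent there.
    """
    if tag == necessary_tag:
        # Tags agree: forward directly to the functional section (claim 75).
        return ("functional_section", value)
    # Tags disagree: store the result at its buffer location (claim 76).
    buffer[tag] = value
    return ("buffer", value)
```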
 77. A superscalar microprocessor that is able to execute at least one instruction out of program order, said superscalar microprocessor comprising: a plurality of functional sections that execute inputted instructions; a buffer that stores execution results of instructions including instructions executed out of program order; a register renaming section that uniquely allocates a tag to an instruction executed by the functional section, the tag indicating a storage location in the buffer of the execution result of the executed instruction; a tag determination section that determines a tag necessary for execution of an instruction that will be executed after the executed instruction; and an execution result sending section that sends the execution result of the executed instruction to one of the functional section other than the functional section that has executed the instruction if the tag allocated to the executed instruction agrees with the necessary tag.
 78. A superscalar microprocessor that is able to execute at least one instruction out of program order, said superscalar microprocessor comprising: an instruction window that holds, on a first in first out basis, a plurality of instruction groups having a predetermined number of instructions, said instruction window having an entry holding a target instruction group for execution; an instruction window control section that transfers an instruction group subsequent to the current target instruction group to the entry if all of the predetermined number of instructions included in the instruction group held by the entry has been retired; a buffer that stores execution results of instructions executed out of program order; a register renaming section that uniquely allocates, to the executed instruction, a tag that is associated with an instruction in the instruction window, said tag including information that indicates a same location in the buffer at which an execution result of the instruction is stored, regardless of transition of the instruction groups in the instruction window; and an operand acquisition section that acquires an operand of the instruction, which is stored in the instruction window and is dependent on the execution result of another instruction, from the buffer by using the tag.